ROSSMANN Stores Sales Predictions¶

Summary¶

  1. Context
  2. Challenge
  3. Solution Development
  4. Conclusion and Demonstration
  5. Next Steps

1. Context¶

  • Monthly Results Meeting
  • The CFO asked for a sales forecast for each store covering the next 6 weeks

2. Challenge¶

Problem¶

  • Defining the budget for store renovations

Causes¶

  • Current sales forecasts are highly inconsistent.

  • The forecasting process relies on past experience.

  • All forecasts are made manually, store by store, across Rossmann's 1,115 stores.

  • Sales figures can only be viewed on a computer.

Solution¶

  • Use machine learning to forecast sales for all stores.

  • Sales predictions can be viewed on a smartphone.

3. Solution Development¶

Data Description¶

Data Dimension¶

Number of rows: 1,017,209
Number of columns: 18

Descriptive Statistics¶

attributes                    min     max       range     mean         median  std           skew       kurtosis
----------------------------  ------  --------  --------  -----------  ------  ------------  ---------  ----------
store                         1.0     1115.0    1114.0    558.429727   558.0   321.908493    -0.000955  -1.200524
day_of_week                   1.0     7.0       6.0       3.998341     4.0     1.997390      0.001593   -1.246873
sales                         0.0     41551.0   41551.0   5773.818972  5744.0  3849.924283   0.641460   1.778375
customers                     0.0     7388.0    7388.0    633.145946   609.0   464.411506    1.598650   7.091773
open                          0.0     1.0       1.0       0.830107     1.0     0.375539      -1.758045  1.090723
promo                         0.0     1.0       1.0       0.381515     0.0     0.485758      0.487838   -1.762018
school_holiday                0.0     1.0       1.0       0.178647     0.0     0.383056      1.677842   0.815154
competition_distance          20.0    200000.0  199980.0  5935.442677  2330.0  12547.646829  10.242344  147.789712
competition_open_since_month  1.0     12.0      11.0      6.786849     7.0     3.311085      -0.042076  -1.232607
competition_open_since_year   1900.0  2015.0    115.0     2010.324840  2012.0  5.515591      -7.235657  124.071304
promo2                        0.0     1.0       1.0       0.500564     1.0     0.500000      -0.002255  -1.999999
promo2_since_week             1.0     52.0      51.0      23.619033    22.0    14.310057     0.178723   -1.184046
promo2_since_year             2009.0  2015.0    6.0       2012.793297  2013.0  1.662657      -0.784436  -0.210075
is_promo                      0.0     1.0       1.0       0.155231     0.0     0.362124      1.904152   1.625796
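
As a rough illustration of how the columns in this table can be derived, the sketch below computes them for a single numeric column in plain Python. The `describe` helper and the toy columns are hypothetical, and the population (divide by n) form of the moments is an assumption, since the deck does not say which estimator was used.

```python
import math


def describe(values):
    """Summarise one numeric column with the statistics used in the table."""
    n = len(values)
    ordered = sorted(values)
    mean = sum(values) / n
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    # Central moments in population form (divide by n); an assumption,
    # since the source does not say which estimator it used.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    return {
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "mean": mean,
        "median": median,
        "std": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,          # > 0 means a longer right tail
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis; 0 for a normal curve
    }


# Hypothetical toy columns, not rows from the Rossmann data.
balanced_flag = [0, 1, 0, 1, 1, 0, 1, 0]
long_tailed = [20, 50, 100, 200, 5000]

flag_stats = describe(balanced_flag)
tail_stats = describe(long_tailed)
print(flag_stats["skew"], flag_stats["kurtosis"])
print(tail_stats["skew"] > 0)
```

A 0/1 column split evenly between its two values has an excess kurtosis of exactly -2, which is consistent with the promo2 row (mean close to 0.5, kurtosis -1.999999) and suggests the table reports excess kurtosis.
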
  • Rossmann was founded in 1972. Values of competition_open_since_year earlier than 1972 are the opening years of the nearest competitors, which belong to other pharmacy chains.

  • competition_distance has high positive skew (10.24) and kurtosis (147.79), so its distribution is right-skewed with a heavy tail.

  • Feature ranges differ widely (for example, promo spans 0 to 1 while competition_distance spans 20 to 200,000). In scale-sensitive models, features with larger ranges can dominate training, so the features need to be rescaled.

Mind Map Hypothesis¶

[Figure: mind map of the hypotheses]

Exploratory Analysis Hypotheses¶

H1. On average, stores with a larger assortment should sell more.

H2. Stores with closer competitors should sell less.

H3. Stores with competitors that have been around for longer should sell more.

H4. Stores with more consecutive promotions should sell more than stores with regular promotions.

H5. Stores open during the Christmas holidays should sell more.

H6. Stores should sell more over the years.

H7. Stores should sell more in the second half of the year.

H8. Stores should sell less after the 10th of each month.

H9. Stores should sell more on average on weekends.

H10. Stores should sell less during school holidays.

Exploratory Data Analysis¶

Response Variable¶

[Figure: response variable (sales)]

Numerical Variables¶

[Figure: histograms of the numerical attributes, 25 bins each]

Categorical Variables¶

[Figure: categorical variables]

Hypothesis Validation¶

H1. On average, stores with a larger assortment should sell more.¶

True: stores with a larger assortment sell more on average.

[Figure: sales by assortment]

H4. Stores with more consecutive promotions should sell more than stores with regular promotions.¶

False: stores with more consecutive promotions sell less.

[Figure: sales of stores with consecutive vs. regular promotions]

H9. Stores should sell more on average on weekends.¶

False: there is not enough evidence to conclude that weekend sales are higher than weekday sales.

[Figure: weekend vs. weekday sales]

Summary of Hypotheses¶

Hypotheses    Conclusion    Relevance
------------  ------------  -----------
H1            True          High
H2            False         Low
H3            False         Low
H4            False         High
H5            True          High
H6            True          Low
H7            True          High
H8            True          High
H9            False         High
H10           False         High

Multivariate Analysis¶

Numerical Attributes¶

[Figure: numerical attributes]

Categorical Attributes¶

[Figure: categorical attributes]

Machine Learning Modelling¶

Compare Model's Performance¶

Model Name                 MAE          MAPE      RMSE
-------------------------  -----------  --------  -----------
XGB Regressor              694.066535   0.102831  999.914943
Random Forest Regressor    747.458229   0.111702  1098.595402
Average Model              1429.763326  0.216814  1939.328730
Linear Regression          1867.623495  0.296267  2657.022835
Linear Regression - Lasso  2192.664126  0.343490  3092.842416
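
For reference, the three error metrics in this comparison can be written in a few lines. This is a minimal sketch with hypothetical function names and made-up sample values. Note that MAPE divides by the actual value, so days with zero sales (the sales column has a minimum of 0) would have to be excluded first; whether and how the original notebook did that is not stated.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as sales."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.10 means 10%).

    Undefined when an actual value is 0, so such rows must be removed first.
    """
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up daily sales and forecasts for illustration only.
actual = [5000.0, 6000.0, 4000.0]
forecast = [5500.0, 5700.0, 4000.0]

print(mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
```

Read this way, the XGB Regressor row says its forecasts were off by about 694 sales units per day on average, or roughly 10% of actual sales.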

4. Conclusion and Demonstration¶

Business Performance¶

store  predictions  worst_scenario  best_scenario  MAE       MAPE
-----  -----------  --------------  -------------  --------  ----
292    108,383.78   105,018.11      111,749.45     3,365.67  0.61
909    244,502.50   237,037.94      251,967.06     7,464.56  0.52
876    200,492.30   196,658.64      204,325.96     3,833.66  0.29
722    356,656.97   354,564.33      358,749.61     2,092.64  0.28
595    383,771.44   379,840.58      387,702.29     3,930.85  0.27
...    ...          ...             ...            ...       ...
494    326,901.41   326,470.16      327,332.65     431.25    0.06
374    258,871.70   258,486.16      259,257.24     385.54    0.06
562    747,947.50   746,976.29      748,918.71     971.21    0.06
959    255,862.62   255,458.45      256,266.80     404.17    0.05
260    232,898.20   232,580.29      233,216.12     317.91    0.05

1115 rows × 6 columns
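
The displayed rows are consistent, up to rounding to the cent, with the worst and best scenarios being the prediction minus and plus the store's MAE. That reading is an inference from the table, not something the deck states. Under that assumption, the calculation for store 292 can be sketched as follows (the `scenarios` helper is hypothetical).

```python
from decimal import Decimal


def scenarios(prediction, mae):
    """Return (worst, best) as the prediction minus/plus the store's MAE."""
    return prediction - mae, prediction + mae


# Store 292 from the table above; Decimal keeps the cents exact.
prediction = Decimal("108383.78")
mae = Decimal("3365.67")

worst, best = scenarios(prediction, mae)
print(f"worst={worst:,} best={best:,}")
```

Using `Decimal` rather than `float` avoids binary rounding noise when the results are compared with the cent-level figures in the table.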

[Figure: business performance per store]

Total Performance¶

Scenario        Values
--------------  -----------------
predictions     US$290,943,413.95
worst_scenario  US$290,165,597.86
best_scenario   US$291,721,230.03

Machine Learning Performance¶

[Figure: machine learning performance]

Demonstration - Telegram¶

[Figure: Telegram demonstration]

5. Next Steps¶

  • Model Workshop for Business Users
  • Collect Usability Feedback
  • Improve model performance by reducing MAPE by 5%

Q & A¶

Thank You Very Much!¶